Skip to content

Conversation

nnethercote
Copy link
Contributor

This shrinks Option<Symbol> from 8 bytes to 4 bytes, which shrinks
Token from 24 bytes to 16 bytes. This reduces instruction counts by up
to 1% across a range of benchmarks.

The commit introduces a new type, SymbolVec, to encapsulate the
1-indexing now required to inter-operate with the non-zero indices.

This shrinks `Option<Symbol>` from 8 bytes to 4 bytes, which shrinks
`Token` from 24 bytes to 16 bytes. This reduces instruction counts by up
to 1% across a range of benchmarks.

The commit introduces a new type, `SymbolVec`, to encapsulate the
1-indexing now required to inter-operate with the non-zero indices.
@rust-highfive
Copy link
Contributor

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 10, 2018
@nnethercote
Copy link
Contributor Author

Here is the first part of the Cachegrind diff on one benchmark. It shows lots of little improvements all over the parser, as you'd expect.

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
-47,660,021 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
-11,088,522 (23.27%)  /home/njn/moz/rustN/src/libsyntax/tokenstream.rs:<syntax::tokenstream::TokenStream as core::clone::Clone>::clone
  9,362,836 (-19.6%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:syntax_pos::hygiene::SyntaxContext::as_u32
 -9,362,836 (19.65%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:syntax_pos::symbol::Symbol::as_u32
 -7,706,886 (16.17%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_rules.rs:<syntax::ext::tt::macro_rules::MacroRulesMacroExpander as syntax::ext::base::TTMacroExpander>::expand
 -6,599,254 (13.85%)  /home/njn/moz/rustN/src/libsyntax/parse/parser.rs:syntax::parse::parser::TokenCursor::next
 -5,627,859 (11.81%)  /home/njn/moz/rustN/src/libcore/option.rs:syntax::parse::parser::TokenCursor::next
 -4,994,685 (10.48%)  /home/njn/moz/rustN/src/libsyntax/parse/parser.rs:syntax::parse::parser::Parser::next_tok
 -4,670,190 ( 9.80%)  /home/njn/moz/rustN/src/libsyntax/parse/parser.rs:syntax::parse::parser::Parser::new
 -3,918,583 ( 8.22%)  /home/njn/moz/rustN/src/libsyntax/tokenstream.rs:syntax::parse::parser::TokenCursor::next
 -3,040,491 ( 6.38%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_parser.rs:syntax::ext::tt::macro_parser::parse
  2,848,455 (-5.98%)  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:<T as serialize::serialize::Encodable>::encode
  2,848,455 (-5.98%)  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:<T as serialize::serialize::Encodable>::encode
 -2,485,687 ( 5.22%)  /home/njn/moz/rustN/src/libcore/option.rs:<syntax::tokenstream::TokenStream as core::clone::Clone>::clone
 -2,185,073 ( 4.58%)  /build/glibc-OTsEL5/glibc-2.27/elf/dl-lookup.c:do_lookup_x
  1,984,346 (-4.16%)  /home/njn/moz/rustN/src/libsyntax/parse/token.rs:syntax::parse::parser::TokenCursor::next
 -1,802,517 ( 3.78%)  /home/njn/moz/rustN/src/libcore/ptr.rs:syntax::parse::parser::TokenCursor::next
  1,779,398 (-3.73%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:<syntax::tokenstream::TokenStream as core::clone::Clone>::clone
  1,709,073 (-3.59%)  /home/njn/moz/rustN/src/libcore/num/mod.rs:<T as serialize::serialize::Encodable>::encode
 -1,580,218 ( 3.32%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_parser.rs:syntax::ext::tt::macro_parser::token_name_eq
  1,580,031 (-3.32%)  /home/njn/moz/rustN/src/libsyntax/parse/token.rs:syntax::ext::tt::macro_parser::token_name_eq
 -1,558,222 ( 3.27%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/quoted.rs:syntax::ext::tt::macro_parser::parse
  1,548,572 (-3.25%)  /home/njn/moz/rustN/src/libsyntax/tokenstream.rs:<syntax::ext::tt::macro_rules::MacroRulesMacroExpander as syntax::ext::base::TTMacroExpander>::expand
 -1,535,496 ( 3.22%)  /home/njn/moz/rustN/src/libcore/slice/mod.rs:<syntax::ext::tt::macro_rules::MacroRulesMacroExpander as syntax::ext::base::TTMacroExpander>::expand
 -1,533,944 ( 3.22%)  /home/njn/moz/rustN/src/libcore/option.rs:syntax::ext::tt::macro_parser::parse

@Mark-Simulacrum
Copy link
Member

@bors r+

Seems like a good optimization.

@bors
Copy link
Collaborator

bors commented Dec 10, 2018

📌 Commit cf682e0 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 10, 2018
@leonardo-m
Copy link

NonZeroU32 allows to denote one value (zero) as special, so with compiler help Option<> uses it to denote a missing value. But then you have had to implement the 1-indexing. I guess the CPU doesn't pay much those 1-subtractions caused by SymbolVec::get, but an alternative solution is to use a different value, like u32::MAX to denote the missing value for Option<>. This still restricts the u32 range by 1, but doesn't require the subtractions and the 1-indexing code.

In the D standard library ( https://dlang.org/phobos/std_typecons.html#Nullable ) there is a second version of Nullable that allows that:
struct Nullable(T, T nullValue);
Rust Integer generics should help create something like this.

/// Symbols (which are 1-indexed) index into this (which is 0-indexed
/// internally). The methods handle the index conversions.
#[derive(Default)]
pub struct SymbolVec(Vec<&'static str>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make Symbol a newtype_index, which would prevent such index arithmetic hacks and even make Option<Option<Symbol>> and deeper nestings take up the same amount of bytes.

That would require no changes to the list of symbols, too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does newtype_index avoid the space for the tag when used within an Option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reserves (u32::max_value() - 255)..=u32::max_value() via compiler-internal trickery:

#[rustc_layout_scalar_valid_range_end($max)]

@oli-obk
Copy link
Contributor

oli-obk commented Dec 10, 2018

@bors r- we should discuss the way this optimization is implemented

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 10, 2018
@petrochenkov petrochenkov self-assigned this Dec 10, 2018
@petrochenkov
Copy link
Contributor

I tried this previously (#28657) and found the difference to be lost in noise, but I didn't have proper benchmarks.

@petrochenkov
Copy link
Contributor

If the goal is to shrink Token specifically, then Option<Symbol> can be replaced with Symbol with keywords::Invalid.name() as None.
There are multiple places do that already.

@petrochenkov
Copy link
Contributor

Reserving a slot in the end of integer range instead of 0 (#56662 (comment)) seems like the best solution.
Perhaps then we could do the opposite of #56662 (comment) and avoid keywords::Invalid in some places.

@alexcrichton
Copy link
Member

r? @oli-obk

@petrochenkov petrochenkov self-assigned this Dec 10, 2018
@nnethercote
Copy link
Contributor Author

nnethercote commented Dec 10, 2018

If the goal is to shrink Token specifically, then Option<Symbol> can be replaced with Symbol with keywords::Invalid.name() as None.

I wrote a patch doing that a while ago. It makes the code much uglier, and spreads complexity to every piece of code that touches Lit. I strongly prefer this PR's approach.

@nnethercote
Copy link
Contributor Author

I opened a new PR (#56699) for the newtype_index! approach.

@nnethercote nnethercote deleted the NonZero-Symbol branch December 12, 2018 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants